Detection of copy number variation in wheat

Ricardo H. Ramirez-Gonzalez

28 Sept 2020

Detection of medium and large CNV

  • Copy number variations (CNVs) are important for functional analysis and gene specialisation
  • High-throughput sequencing across a large set of samples

Normalisation

Window        Sample_1  Sample_2  Sample_3  Sample_4
chr1:1:100    19        24        12        14
chr2:101:200  25        23        17        18
chr1:201:300  20        27        12        18
chr2:301:400  19        25        16        10
chr2:401:500  17        27        14        25

Initial data

  • Coverage across a region is not uniform

Normalisation

\(x_{i,j}=\frac{WindowCoverage_{i,j}\times10^{9}}{WindowLength_{i}\times{totalReadsSample_{j}}}\)

\(xnorm_{i,j}=\frac{x_{i,j}}{mean(X_{i})}\)

Normalisation by sample

\(x_{i,j}=\frac{WindowCoverage_{i,j}\times10^{9}}{WindowLength_{i}\times{totalReadsSample_{j}}}\)

Window            Sample_1  Sample_2  Sample_3  Sample_4
chr1:1:100        19        24        12        14
chr2:101:200      25        23        17        18
chr1:201:300      20        27        12        18
chr2:301:400      19        25        16        10
chr2:401:500      17        27        14        25
totalReadsSample  100       126       71        85

Normalisation by window

\(xnorm_{i,j}=\frac{x_{i,j}}{mean(X_{i})}\)

Window        Sample_1   Sample_2   Sample_3   Sample_4   Window mean
chr1:1:100    1,919,192  1,924,002  1,707,213  1,663,696  1,803,526
chr2:101:200  2,525,253  1,843,835  2,418,552  2,139,037  2,231,669
chr1:201:300  2,020,202  2,164,502  1,707,213  2,139,037  2,007,739
chr2:301:400  1,919,192  2,004,169  2,276,284  1,188,354  1,847,000
chr2:401:500  1,717,172  2,164,502  1,991,748  2,970,885  2,211,077

Normalised coverage

Window        Sample_1  Sample_2  Sample_3  Sample_4
chr1:1:100    1.06      1.07      0.95      0.92
chr2:101:200  1.13      0.83      1.08      0.96
chr1:201:300  1.01      1.08      0.85      1.07
chr2:301:400  1.04      1.09      1.23      0.64
chr2:401:500  0.78      0.98      0.90      1.34
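
The two normalisation steps can be reproduced on the example table with a short Python sketch. The function names are illustrative. The window length is assumed to be end minus start (99 for these windows), because that choice reproduces the tabulated values; the slides do not state it explicitly.

```python
# Raw read counts per window (rows) and sample (columns), from the example table.
windows = ["chr1:1:100", "chr2:101:200", "chr1:201:300",
           "chr2:301:400", "chr2:401:500"]
counts = [
    [19, 24, 12, 14],
    [25, 23, 17, 18],
    [20, 27, 12, 18],
    [19, 25, 16, 10],
    [17, 27, 14, 25],
]


def window_length(window):
    # Assumption: length = end - start, inferred from the table values.
    _, start, end = window.split(":")
    return int(end) - int(start)


def normalise_by_sample(windows, counts):
    # x[i][j] = coverage * 1e9 / (window length * total reads of sample j)
    totals = [sum(column) for column in zip(*counts)]
    return [
        [c * 1e9 / (window_length(w) * t) for c, t in zip(row, totals)]
        for w, row in zip(windows, counts)
    ]


def normalise_by_window(x):
    # xnorm[i][j] = x[i][j] / mean of window i across samples
    result = []
    for row in x:
        mean = sum(row) / len(row)
        result.append([value / mean for value in row])
    return result


x = normalise_by_sample(windows, counts)
xnorm = normalise_by_window(x)
for window, row in zip(windows, xnorm):
    print(window, " ".join(f"{value:.2f}" for value in row))
```

The printed rows should match the normalised coverage table above. A window with zero coverage in every sample would cause a division by zero here; this sketch does not handle that case.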

Second normalisation

  • Remove the windows with \(sd(window) > 0.3\)
  • Repeat normalisation
  • Exclude the datapoints without coverage
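
A minimal sketch of this second pass, under several assumptions that the slides do not confirm: the population standard deviation is used, a read count of zero is treated as a datapoint without coverage (stored as `None`), and "repeat normalisation" means recomputing the per-sample totals from the retained windows only. All function names are hypothetical.

```python
from statistics import pstdev


def normalise(counts, lengths):
    """Two-step normalisation; zero-coverage datapoints become None."""
    totals = [sum(column) for column in zip(*counts)]
    result = []
    for row, length in zip(counts, lengths):
        x = [c * 1e9 / (length * t) if c > 0 else None
             for c, t in zip(row, totals)]
        covered = [v for v in x if v is not None]
        mean = sum(covered) / len(covered) if covered else None
        result.append([v / mean if v is not None else None for v in x])
    return result


def window_sd(norm_row):
    """SD of a window across samples, ignoring datapoints without coverage."""
    values = [v for v in norm_row if v is not None]
    return pstdev(values) if len(values) > 1 else 0.0


def second_normalisation(counts, lengths, max_sd=0.3):
    """Drop windows with sd > max_sd, then normalise the remaining windows."""
    first = normalise(counts, lengths)
    keep = [i for i, row in enumerate(first) if window_sd(row) <= max_sd]
    kept_counts = [counts[i] for i in keep]
    kept_lengths = [lengths[i] for i in keep]
    return keep, normalise(kept_counts, kept_lengths)


# Five well-behaved windows and one noisy window (made-up counts).
counts = [[10, 10, 10, 10]] * 5 + [[2, 20, 2, 20]]
lengths = [100] * 6
keep, renormalised = second_normalisation(counts, lengths)
print(keep)
```

In this toy example the noisy window skews the per-sample totals in the first pass; once it is removed, the remaining windows normalise to exactly 1.0, which is the motivation for repeating the normalisation.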

Watkins collection

  • 823 landraces sequenced at >10x coverage
  • In collaboration with the Griffiths group and the Agricultural Genomics Institute of Shenzhen (AGIS)
  • Search for CNV within genes

Standard deviation of lines across samples (QC).

5 out of 823 lines are noisy (\(\sigma > 0.45\)).

line SD
WATDE0009 0.75
WATDE0039 1.08
WATDE0056 0.51
WATDE0060 0.52
WATDE0090 0.50
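
The QC step can be sketched as follows, assuming the SD of a line is the standard deviation of its normalised coverage over all windows (the slides do not spell out the exact definition). The population SD and the function names are also assumptions, and the line names and values in the example are made up.

```python
from statistics import pstdev


def line_sd(xnorm, sample_index):
    """SD of one line's normalised coverage across windows, skipping gaps."""
    values = [row[sample_index] for row in xnorm
              if row[sample_index] is not None]
    return pstdev(values)


def noisy_lines(xnorm, names, threshold=0.45):
    """Return {line name: SD} for lines whose SD exceeds the threshold."""
    flagged = {}
    for j, name in enumerate(names):
        sd = line_sd(xnorm, j)
        if sd > threshold:
            flagged[name] = round(sd, 2)
    return flagged


# Rows are windows, columns are lines (hypothetical values).
xnorm = [
    [1.0, 0.2],
    [1.1, 1.9],
    [0.9, 0.1],
    [1.0, 1.8],
]
print(noisy_lines(xnorm, ["line_A", "line_B"]))
```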

Regular vs noisy line

Zoom out to larger region

Outline represents \(1\pm2.5\sigma\)

Regular line (\(\sigma<0.45\))

Noisy line (\(\sigma > 0.45\))
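
The \(1\pm2.5\sigma\) outline in these plots can be computed per line as below. Treating windows outside the outline as candidate losses or gains is an assumption made for illustration; the slides only say that the outline is drawn. The labels and function names are hypothetical.

```python
def band(sigma, k=2.5):
    """Lower and upper limits of the 1 +/- k*sigma outline."""
    return 1 - k * sigma, 1 + k * sigma


def classify(values, sigma, k=2.5):
    """Label each normalised coverage value relative to the outline."""
    low, high = band(sigma, k)
    labels = []
    for value in values:
        if value < low:
            labels.append("loss")
        elif value > high:
            labels.append("gain")
        else:
            labels.append("normal")
    return labels


print(classify([1.0, 0.4, 1.6], sigma=0.2))
```

A noisy line has a wide outline, so fewer windows fall outside it and real events are harder to separate from noise.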

Merging continuous CNVs

  • Stitch individual windows with CNVs
  • Find the extent of a variation
  • Bridge over noisy windows
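
One simple way to stitch flagged windows is sketched below. It is not necessarily the algorithm used for this dataset: the `max_gap` parameter, which sets how many unflagged (for example noisy) windows may be bridged, is a hypothetical name and value.

```python
def stitch(flags, max_gap=1):
    """Merge flagged windows into inclusive (first, last) index ranges.

    Runs separated by at most `max_gap` unflagged windows are joined.
    """
    segments = []
    for i, flagged in enumerate(flags):
        if not flagged:
            continue
        if segments and i - segments[-1][1] - 1 <= max_gap:
            segments[-1][1] = i          # extend the current segment
        else:
            segments.append([i, i])      # start a new segment
    return [tuple(segment) for segment in segments]


flags = [0, 1, 1, 0, 1, 0, 0, 1]
print(stitch(flags, max_gap=1))  # [(1, 4), (7, 7)]
print(stitch(flags, max_gap=0))  # [(1, 2), (4, 4), (7, 7)]
```

In practice, gains and losses would probably be stitched separately, so that a duplication is never merged with a neighbouring deletion.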

Stitch CNV candidates

CNV length distributions

There are 43,412,060 CNV events across 823 lines

CNV length distributions (over 200bp)

  • The minimum CNV size in this dataset is 200bp
  • 22,294,613 (51.36%) are not singletons

Next steps

  • Improve the stitching algorithm
  • Use smaller window size (150bp, 100bp)
  • Find genes/regions more prone to have CNVs
  • Analyse in detail known CNVs

Deletions in 𝛾-radiation lines

  • ~600 Paragon lines with radiation-induced deletions
  • Identify lines with missing genes to test changes in phenotype
  • Low sequencing coverage (0.3x)
  • Large window sizes to make up for low coverage
  • Collaboration with the Nicholson lab
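
A rough calculation shows why low coverage calls for large windows. The expected number of reads starting in a window is approximately coverage × window length / read length. The 150bp read length and the 100,000bp window below are assumptions for illustration only; the slides give neither value for this dataset.

```python
def expected_reads(coverage, window_bp, read_bp=150):
    """Approximate number of reads expected to start in one window."""
    return coverage * window_bp / read_bp


for window_bp in (200, 10_000, 100_000):
    reads = expected_reads(0.3, window_bp)
    print(f"{window_bp:>7} bp window at 0.3x: ~{reads:.1f} reads")
```

Under these assumptions a 200bp window at 0.3x contains less than one read on average, so a missing window cannot be told apart from sampling noise, whereas a 100,000bp window contains about 200 reads.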

Normalised coverage in a line and region

Deletions across genome

Paragon deletions website

http://wheat-deletion.cyverseuk.org

Next steps

  • Find genes without deletions
  • Find a minimal set of lines that cover all the genes that can be deleted
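
Finding a minimal set of lines is an instance of the set cover problem, for which an exact solution is expensive on large inputs. A greedy approximation is sketched below as one possible starting point; it returns a small cover, not necessarily the smallest. The line and gene names are made up.

```python
def greedy_cover(deletions):
    """Pick lines until every deletable gene is covered.

    deletions: dict mapping line name -> set of genes deleted in that line.
    """
    uncovered = set().union(*deletions.values())
    chosen = []
    while uncovered:
        # Take the line that deletes the most still-uncovered genes;
        # sorting makes tie-breaking deterministic.
        best = max(sorted(deletions),
                   key=lambda line: len(deletions[line] & uncovered))
        chosen.append(best)
        uncovered -= deletions[best]
    return chosen


deletions = {
    "P1": {"g1", "g2", "g3"},
    "P2": {"g3", "g4"},
    "P3": {"g4"},
    "P4": {"g1"},
}
print(greedy_cover(deletions))  # ['P1', 'P2']
```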

Acknowledgments

  • WatSeq: Simon Griffiths (JIC), Cheng Shifeng (AGIS), Burkhard Steuernagel (JIC), CiS
  • Paragon 𝛾-radiation lines: Paul Nicholson (JIC), Ben Hales (JIC), Anil Thanki (EI), Matt Clark (EI/NHM)